Results 1 - 20 of 122
1.
Cell Genom ; 4(4): 100539, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38604127

ABSTRACT

Polygenic risk scores (PRSs) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in summary statistics from genome-wide association studies (GWASs) across multiple ancestry groups via Bayesian hierarchical modeling and ensemble learning. In our simulation studies and data analyses across four distinct studies, totaling 5.7 million participants with substantial ancestral diversity, MUSSEL shows promising performance compared to alternatives. For example, MUSSEL has an average gain in prediction R2 across 11 continuous traits of 40.2% and 49.3% compared to PRS-CSx and CT-SLEB, respectively, in the African ancestry population. The best-performing method, however, varies by GWAS sample size, target ancestry, trait architecture, and linkage disequilibrium reference samples; thus, ultimately a combination of methods may be needed to generate the most robust PRSs across diverse populations.
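Two generic quantities sit behind this comparison: a PRS for one person is typically a weighted sum of effect-allele dosages, and an "average gain in prediction R2" of 40.2% is read here as a relative, not absolute, difference. The Python sketch below shows only these two calculations under that assumption. It does not implement MUSSEL's Bayesian hierarchical model or its ensemble step, and the function names and all numbers are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def relative_r2_gain(r2_new, r2_baseline):
    """Relative improvement in prediction R2 of one method over another, in percent."""
    return 100.0 * (r2_new - r2_baseline) / r2_baseline


# Hypothetical per-variant weights and one individual's dosages (not from the study).
weights = [0.12, -0.05, 0.30]
dosages = [2, 1, 0]
score = polygenic_score(dosages, weights)  # approximately 0.19

# Hypothetical R2 values: 0.07 versus a baseline of 0.05 is a 40% relative gain,
# even though the absolute difference is only 0.02.
gain = relative_r2_gain(0.07, 0.05)
```

Reporting relative gains makes improvements comparable across traits whose absolute R2 values differ widely, but small absolute R2 values can make relative gains look large.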


Subject(s)
Bivalvia , Multifactorial Inheritance , Humans , Animals , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Bayes Theorem , Phenotype , Genetic Risk Score
2.
Am J Hum Genet ; 111(1): 11-23, 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38181729

ABSTRACT

Precision medicine initiatives across the globe have led to a revolution of repositories linking large-scale genomic data with electronic health records, enabling genomic analyses across the entire phenome. Many of these initiatives focus solely on research insights, leading to limited direct benefit to patients. We describe the biobank at the Colorado Center for Personalized Medicine (CCPM Biobank) that was jointly developed by the University of Colorado Anschutz Medical Campus and UCHealth to serve as a unique, dual-purpose research and clinical resource accelerating personalized medicine. This living resource currently has more than 200,000 participants with ongoing recruitment. We highlight the clinical, laboratory, regulatory, and HIPAA-compliant informatics infrastructure along with our stakeholder engagement, consent, recontact, and participant engagement strategies. We characterize aspects of genetic and geographic diversity unique to the Rocky Mountain region, the primary catchment area for CCPM Biobank participants. We leverage linked health and demographic information of the CCPM Biobank participant population to demonstrate the utility of the CCPM Biobank to replicate complex trait associations in the first 33,674 genotyped individuals across multiple disease domains. Finally, we describe our current efforts toward return of clinical genetic test results, including high-impact pathogenic variants and pharmacogenetic information, and our broader goals as the CCPM Biobank continues to grow. Bringing clinical and research interests together fosters unique clinical and translational questions that can be addressed from the large EHR-linked CCPM Biobank resource within a HIPAA- and CLIA-certified environment.


Subject(s)
Learning Health System , Precision Medicine , Humans , Biological Specimen Banks , Colorado , Genomics
3.
J Clin Endocrinol Metab ; 109(2): 402-412, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-37683082

ABSTRACT

CONTEXT: Thyroid nodule ultrasound-based risk stratification schemas rely on the presence of high-risk sonographic features. However, some malignant thyroid nodules have benign appearance on thyroid ultrasound. New methods for thyroid nodule risk assessment are needed. OBJECTIVE: We investigated polygenic risk score (PRS) accounting for inherited thyroid cancer risk combined with ultrasound-based analysis for improved thyroid nodule risk assessment. METHODS: The convolutional neural network classifier was trained on thyroid ultrasound still images and cine clips from 621 thyroid nodules. Phenome-wide association study (PheWAS) and PRS PheWAS were used to optimize PRS for distinguishing benign and malignant nodules. PRS was evaluated in 73 346 participants in the Colorado Center for Personalized Medicine Biobank. RESULTS: When the deep learning model output was combined with thyroid cancer PRS and genetic ancestry estimates, the area under the receiver operating characteristic curve (AUROC) of the benign vs malignant thyroid nodule classifier increased from 0.83 to 0.89 (DeLong, P value = .007). The combined deep learning and genetic classifier achieved a clinically relevant sensitivity of 0.95, 95% CI [0.88-0.99], specificity of 0.63 [0.55-0.70], and positive and negative predictive values of 0.47 [0.41-0.58] and 0.97 [0.92-0.99], respectively. AUROC improvement was consistent in European ancestry-stratified analysis (0.83 and 0.87 for deep learning and deep learning combined with PRS classifiers, respectively). Elevated PRS was associated with a greater risk of thyroid cancer structural disease recurrence (ordinal logistic regression, P value = .002). CONCLUSION: Augmenting ultrasound-based risk assessment with PRS improves diagnostic accuracy.
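The sensitivity, specificity, and positive and negative predictive values reported above are standard ratios from a 2x2 confusion matrix at one decision threshold. The short Python sketch below shows how they are computed; the class name, function name and counts are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # malignant nodules called malignant
    fp: int  # benign nodules called malignant
    tn: int  # benign nodules called benign
    fn: int  # malignant nodules called benign


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Standard ratios from a 2x2 confusion matrix at one decision threshold."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts chosen for illustration; they are not the study's data.
metrics = diagnostic_metrics(ConfusionCounts(tp=19, fp=40, tn=60, fn=1))
```

Because PPV and NPV also depend on the proportion of malignant nodules in the evaluated set, they can shift between cohorts even when sensitivity and specificity stay the same.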


Subject(s)
Thyroid Neoplasms , Thyroid Nodule , Humans , Thyroid Nodule/diagnostic imaging , Thyroid Nodule/genetics , Sensitivity and Specificity , Neoplasm Recurrence, Local , Thyroid Neoplasms/diagnostic imaging , Thyroid Neoplasms/genetics , Ultrasonography/methods
4.
mSystems ; 9(1): e0067723, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38095449

ABSTRACT

Inflammatory bowel disease (IBD) is characterized by complex etiology and a disrupted colonic ecosystem. We provide a framework for the analysis of multi-omic data, which we apply to study the gut ecosystem in IBD. Specifically, we train and validate models using data on the metagenome, metatranscriptome, virome, and metabolome from the Human Microbiome Project 2 IBD multi-omic database, with 1,785 repeated samples from 130 individuals (103 cases and 27 controls). After splitting the participants into training and testing groups, we used mixed-effects least absolute shrinkage and selection operator regression to select features for each omic. These features, with demographic covariates, were used to generate separate single-omic prediction scores. All four single-omic scores were then combined into a final regression to assess the relative importance of the individual omics and the predictive benefits when considered together. We identified several species, pathways, and metabolites known to be associated with IBD risk, and we explored the connections between data sets. Individually, metabolomic and viromic scores were more predictive than metagenomic or metatranscriptomic scores, and when all four scores were combined, we predicted disease diagnosis with a Nagelkerke's R2 of 0.46 and an area under the curve of 0.80 (95% confidence interval: 0.63, 0.98). Our work supports the view that some single-omic models for complex traits are more predictive than others, that incorporating multiple omic data sets may improve prediction, and that each omic data type provides a combination of unique and redundant information. This modeling framework can be extended to other complex traits and multi-omic data sets.
IMPORTANCE: Complex traits are characterized by many biological and environmental factors, such that multi-omic data sets are well-positioned to help us understand their underlying etiologies.
We applied a prediction framework across multiple omics (metagenomics, metatranscriptomics, metabolomics, and viromics) from the gut ecosystem to predict inflammatory bowel disease (IBD) diagnosis. The predicted scores from our models highlighted key features and allowed us to compare the relative utility of each omic data set in single-omic versus multi-omic models. Our results emphasized the importance of metabolomics and viromics over metagenomics and metatranscriptomics for predicting IBD status. The greater predictive capability of metabolomics and viromics is likely because these omics serve as markers of lifestyle factors such as diet. This study provides a modeling framework for multi-omic data, and our results show the utility of combining multiple omic data types to disentangle complex disease etiologies and biological signatures.
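Nagelkerke's R2, reported above for the combined model, is the Cox-Snell pseudo-R2 rescaled so that its maximum is 1. As a reference, here is a minimal Python sketch that computes it from the log-likelihoods of a null model and a fitted model. The function names and example numbers are hypothetical, and how the authors obtained log-likelihoods from their mixed-effects models is not stated in the abstract.

```python
import math


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke's R2 from the log-likelihoods of a null and a fitted model.

    ll_null:  log-likelihood of the intercept-only (or covariate-only) model
    ll_model: log-likelihood of the model that adds the predictors of interest
    n:        number of observations
    """
    # Cox-Snell R2: 1 - (L0 / L1) ** (2 / n), computed on the log scale.
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Largest attainable Cox-Snell value (a perfect model has L1 = 1, i.e. ll_model = 0).
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def binary_null_loglik(n_cases: int, n: int) -> float:
    """Log-likelihood of an intercept-only logistic model for a binary outcome."""
    p = n_cases / n
    return n_cases * math.log(p) + (n - n_cases) * math.log(1.0 - p)


# Hypothetical example: 100 participants, 80 cases, fitted-model log-likelihood of -35.
ll0 = binary_null_loglik(80, 100)
r2 = nagelkerke_r2(ll0, -35.0, 100)
```

A model no better than the null gives 0, and a model that fits every observation perfectly gives 1.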


Subject(s)
Inflammatory Bowel Diseases , Microbiota , Humans , Inflammatory Bowel Diseases/diagnosis , Metagenomics/methods , Phenotype , Risk Factors
5.
J Community Genet ; 14(6): 543-553, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37962783

ABSTRACT

Genome-wide association studies (GWAS) have allowed the identification of disease-associated variants, which can be leveraged to build polygenic scores (PGSs). Even though PGSs can be a valuable tool in personalized medicine, their predictive power is limited in populations of non-European ancestry, particularly in admixed populations. Recent efforts have focused on increasing racial and ethnic diversity in GWAS, thus, addressing some of the limitations of genetic risk prediction in these populations. Even with these efforts, few studies focus exclusively on Hispanics/Latinos. Additionally, Hispanic/Latino populations are often considered a single population despite varying admixture proportions between and within ethnic groups, diverse genetic heterogeneity, and demographic history. Combined with highly heterogeneous environmental and socioeconomic exposures, this diversity can reduce the transferability of genetic risk prediction models. Given the recent increase of genomic studies that include Hispanics/Latinos, we review the milestones and efforts that focus on genetic risk prediction, summarize the potential for improving PGS transferability, and highlight the challenges yet to be addressed. Additionally, we summarize social-ethical considerations and provide ideas to promote genetic risk prediction models that can be implemented equitably.

6.
Am J Hum Genet ; 110(11): 1853-1862, 2023 11 02.
Article in English | MEDLINE | ID: mdl-37875120

ABSTRACT

The heritability explained by local ancestry markers in an admixed population ($h_\gamma^2$) provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of $h_\gamma^2$ can be susceptible to biases due to population structure in ancestral populations. Here, we present heritability estimation from admixture mapping summary statistics (HAMSTA), an approach that uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA $h_\gamma^2$ estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ∼5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe $\hat{h}_\gamma^2$ in the 20 phenotypes range from 0.0025 to 0.033 (mean $\hat{h}_\gamma^2 = 0.012 \pm 9.2 \times 10^{-4}$), which translates to $\hat{h}^2$ ranging from 0.062 to 0.85 (mean $\hat{h}^2 = 0.30 \pm 0.023$). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.


Subject(s)
Black or African American , Genetics, Population , Humans , Chromosome Mapping , Phenotype , Polymorphism, Single Nucleotide/genetics
7.
Nature ; 622(7984): 775-783, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821706

ABSTRACT

Latin America continues to be severely underrepresented in genomics research, and fine-scale genetic histories and complex trait architectures remain hidden owing to insufficient data [1]. To fill this gap, the Mexican Biobank project genotyped 6,057 individuals from 898 rural and urban localities across all 32 states in Mexico at a resolution of 1.8 million genome-wide markers with linked complex trait and disease information, creating a valuable nationwide genotype-phenotype database. Here, using ancestry deconvolution and inference of identity-by-descent segments, we inferred ancestral population sizes across Mesoamerican regions over time, unravelling Indigenous, colonial and postcolonial demographic dynamics [2-6]. We observed variation in runs of homozygosity among genomic regions with different ancestries reflecting distinct demographic histories and, in turn, different distributions of rare deleterious variants. We conducted genome-wide association studies (GWAS) for 22 complex traits and found that several traits are better predicted using the Mexican Biobank GWAS than using the UK Biobank GWAS [7,8]. We identified genetic and environmental factors associated with trait variation, such as the length of the genome in runs of homozygosity as a predictor for body mass index, triglycerides, glucose and height. This study provides insights into the genetic histories of individuals in Mexico and dissects their complex trait architectures, both crucial for making precision and preventive medicine initiatives accessible worldwide.
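The length of the genome in runs of homozygosity (ROH), used above as a predictor, is commonly summarized as the total ROH length or as F_ROH, that length divided by the autosomal genome length. The Python sketch below assumes ROH segments have already been called by an upstream tool and are given as half-open base-pair intervals; the function names and example calls are hypothetical, and the study's exact definition may differ.

```python
from collections import defaultdict


def total_roh_length(segments):
    """Total base pairs covered by ROH, merging overlapping calls per chromosome.

    segments: iterable of (chromosome, start, end) with end > start (half-open).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in segments:
        if end <= start:
            raise ValueError("each segment must have end > start")
        by_chrom[chrom].append((start, end))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        run_start, run_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= run_end:  # overlapping or adjacent: extend the current run
                run_end = max(run_end, end)
            else:  # gap: close the current run and start a new one
                total += run_end - run_start
                run_start, run_end = start, end
        total += run_end - run_start
    return total


def f_roh(segments, autosomal_length):
    """Fraction of the autosomal genome covered by ROH."""
    return total_roh_length(segments) / autosomal_length


# Hypothetical ROH calls in base pairs (not study data).
calls = [("1", 0, 100), ("1", 50, 150), ("2", 0, 50)]
```

Merging overlapping calls first avoids double-counting the same homozygous stretch when a caller reports overlapping segments.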


Subject(s)
Biological Specimen Banks , Genetics, Medical , Genome, Human , Genomics , Hispanic or Latino , Humans , Blood Glucose/genetics , Blood Glucose/metabolism , Body Height/genetics , Body Mass Index , Gene-Environment Interaction , Genetic Markers/genetics , Genome-Wide Association Study , Hispanic or Latino/classification , Hispanic or Latino/genetics , Homozygote , Mexico , Phenotype , Triglycerides/blood , Triglycerides/genetics , United Kingdom , Genome, Human/genetics
8.
Hum Genet ; 142(10): 1477-1489, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37658231

ABSTRACT

Inadequate representation of non-European ancestry populations in genome-wide association studies (GWAS) has limited opportunities to isolate functional variants. Fine-mapping in multi-ancestry populations should improve the efficiency of prioritizing variants for functional interrogation. To evaluate this hypothesis, we leveraged ancestry architecture to perform comparative GWAS and fine-mapping of obesity-related phenotypes in European ancestry populations from the UK Biobank (UKBB) and multi-ancestry samples from the Population Architecture using Genomics and Epidemiology (PAGE) consortium with comparable sample sizes. In the investigated regions with genome-wide significant associations for obesity-related traits, fine-mapping in our ancestrally diverse sample led to 95% and 99% credible sets (CS) with fewer variants than in the European ancestry sample. Lead fine-mapped variants in PAGE regions had higher average coding scores and higher average posterior probabilities for causality compared to UKBB. Importantly, 99% CS in PAGE loci contained strong expression quantitative trait loci (eQTLs) in adipose tissues or harbored more variants in tighter linkage disequilibrium (LD) with eQTLs. Leveraging ancestrally diverse populations with heterogeneous ancestry architectures, coupled with functional annotation, increased fine-mapping efficiency and performance, and reduced the set of candidate variants to consider in future functional studies. Significant overlap in genetic causal variants across populations suggests generalizability of genetic mechanisms underpinning obesity-related traits across populations.
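Credible-set size is the resolution measure compared between PAGE and UKBB above. A common construction, assuming a single causal variant per region so that posterior inclusion probabilities sum to about 1, ranks variants by posterior probability and keeps adding them until the cumulative probability reaches the target coverage. The Python sketch below shows that construction only; actual fine-mapping software may build credible sets differently, and the variant IDs and probabilities are hypothetical.

```python
def credible_set(pips, coverage=0.95):
    """Smallest set of top-ranked variants whose posterior probabilities reach `coverage`.

    pips: mapping of variant ID -> posterior probability of being causal, assumed
    to sum to about 1 within the region (single-causal-variant model).
    """
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant, prob in ranked:
        selected.append(variant)
        cumulative += prob
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posterior probabilities for four variants in one region.
region = {"var_a": 0.60, "var_b": 0.30, "var_c": 0.06, "var_d": 0.04}
cs95 = credible_set(region, 0.95)
cs99 = credible_set(region, 0.99)
```

When the posterior is concentrated on fewer variants, as reported for the PAGE regions, the same coverage is reached with a shorter list, leaving fewer candidates for functional follow-up.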


Subject(s)
Genome-Wide Association Study , Obesity , Humans , Molecular Epidemiology , Linkage Disequilibrium , Obesity/genetics , Quantitative Trait Loci/genetics
9.
Front Genet ; 14: 1181167, 2023.
Article in English | MEDLINE | ID: mdl-37600667

ABSTRACT

Peripheral artery disease (PAD) is a form of atherosclerotic cardiovascular disease, affecting ∼8 million Americans, and is known to have racial and ethnic disparities. PAD has been reported to have a significantly higher prevalence in African Americans (AAs) compared to non-Hispanic European Americans (EAs). Hispanic/Latinos (HLs) have been reported to have lower or similar rates of PAD compared to EAs, despite having a paradoxically high burden of PAD risk factors; however, recent work suggests prevalence may differ between sub-groups. Here, we examined a large cohort of diverse adults in the BioMe biobank in New York City. We observed the prevalence of PAD at 1.7% in EAs vs. 8.5% and 9.4% in AAs and HLs, respectively, and among HL sub-groups, the prevalence was 11.4% and 11.5% in Puerto Rican and Dominican populations, respectively. Follow-up analysis that adjusted for common risk factors demonstrated that Dominicans had the highest increased risk for PAD relative to EAs [OR = 3.15 (95% CI 2.33-4.25), $p < 6.44 \times 10^{-14}$]. To investigate whether genetic factors may explain this increased risk, we performed admixture mapping by testing the association between local ancestry and PAD in Dominican BioMe participants (N = 1,813) separately for European, African, and Native American (NAT) continental ancestry tracts. The top association with PAD was an NAT ancestry tract at chromosome 2q35 [OR = 1.96 (SE = 0.16), $p < 2.75 \times 10^{-5}$], with 22.6% vs. 12.9% PAD prevalence in heterozygous NAT tract carriers versus non-carriers, respectively. Fine-mapping at this locus implicated tag SNP rs78529201 located within a long intergenic non-coding RNA (lincRNA) LINC00607, a gene expression regulator of key genes related to thrombosis and extracellular remodeling of endothelial cells, suggesting a putative link of the 2q35 locus to PAD etiology. Efforts to reproduce the signal in other Hispanic cohorts were unsuccessful.
In summary, we showed how leveraging health system data helped understand nuances of PAD risk across HL sub-groups and admixture mapping approaches elucidated a putative risk locus in a Dominican population.
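The odds ratios above come from regression models that adjust for covariates, but the underlying quantity is easiest to see in its crude form. The Python sketch below computes a crude odds ratio with a Woolf (log-scale) 95% confidence interval from a 2x2 table, plus the crude odds ratio implied by two prevalences. Function names and the 2x2 counts are hypothetical; the only study numbers used are the 22.6% and 12.9% prevalences quoted in the abstract.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a: carriers with PAD      b: carriers without PAD
    c: non-carriers with PAD  d: non-carriers without PAD
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell needs a continuity correction or exact methods")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def odds_ratio_from_prevalence(p_exposed: float, p_unexposed: float) -> float:
    """Crude odds ratio implied by two prevalences."""
    return (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))


# Hypothetical 2x2 counts (not study data).
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)

# Prevalences quoted in the abstract: 22.6% in NAT tract carriers vs 12.9% in non-carriers.
crude = odds_ratio_from_prevalence(0.226, 0.129)  # roughly 1.97
```

The crude value happens to sit close to the reported 1.96, but it is unadjusted and should not be read as a reproduction of the published estimate.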

10.
Open Heart ; 10(2), 2023 08.
Article in English | MEDLINE | ID: mdl-37648373

ABSTRACT

INTRODUCTION: The independent and causal cardiovascular disease risk factor lipoprotein(a) (Lp(a)) is elevated in >1.5 billion individuals worldwide, but studies have prioritised European populations. METHODS: Here, we examined how ancestrally diverse studies could clarify Lp(a)'s genetic architecture, inform efforts examining application of Lp(a) polygenic risk scores (PRS), enable causal inference and identify unexpected Lp(a) phenotypic effects using data from African (n=25 208), East Asian (n=2895), European (n=362 558), South Asian (n=8192) and Hispanic/Latino (n=8946) populations. RESULTS: Fourteen genome-wide significant loci with numerous population-specific signals of large effect were identified that enabled construction of Lp(a) PRS of moderate (R2=15% in East Asians) to high (R2=50% in Europeans) accuracy. For all populations, PRS showed promise as a 'rule out' for elevated Lp(a) because certainty of assignment to the low-risk threshold was high (88.0%-99.9%) across PRS thresholds (80th-99th percentile). Causal effects of increased Lp(a) with increased glycated haemoglobin were estimated for Europeans ($p = 1.4 \times 10^{-6}$), although inverse effects in Africans and East Asians suggested the potential for heterogeneous causal effects. Finally, Hispanic/Latinos were the only population in which known associations with coronary atherosclerosis and ischaemic heart disease were identified in external testing of Lp(a) PRS phenotypic effects. CONCLUSIONS: Our results emphasise the merits of prioritising ancestral diversity when addressing Lp(a) evidence gaps.


Subject(s)
Coronary Artery Disease , Myocardial Ischemia , Humans , Lipoprotein(a)/genetics , Evidence Gaps , Risk Factors , Coronary Artery Disease/diagnosis , Coronary Artery Disease/epidemiology , Coronary Artery Disease/genetics
11.
Nat Med ; 29(7): 1845-1856, 2023 07.
Article in English | MEDLINE | ID: mdl-37464048

ABSTRACT

An individual's disease risk is affected by the populations that they belong to, due to shared genetics and environmental factors. The study of fine-scale populations in clinical care is important for identifying and reducing health disparities and for developing personalized interventions. To assess patterns of clinical diagnoses and healthcare utilization by fine-scale populations, we leveraged genetic data and electronic medical records from 35,968 patients as part of the UCLA ATLAS Community Health Initiative. We defined clusters of individuals using identity by descent, a form of genetic relatedness that utilizes shared genomic segments arising due to a common ancestor. In total, we identified 376 clusters, including clusters with patients of Afro-Caribbean, Puerto Rican, Lebanese Christian, Iranian Jewish and Gujarati ancestry. Our analysis uncovered 1,218 significant associations between disease diagnoses and clusters and 124 significant associations with specialty visits. We also examined the distribution of pathogenic alleles and found 189 significant alleles at elevated frequency in particular clusters, including many that are not regularly included in population screening efforts. Overall, this work progresses the understanding of health in understudied communities and can provide the foundation for further study into health inequities.


Subject(s)
Delivery of Health Care , Patient Acceptance of Health Care , Humans , Los Angeles , Iran , Ethnicity
12.
bioRxiv ; 2023 Apr 18.
Article in English | MEDLINE | ID: mdl-37131817

ABSTRACT

The heritability explained by local ancestry markers in an admixed population ($h_\gamma^2$) provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of $h_\gamma^2$ can be susceptible to biases due to population structure in ancestral populations. Here, we present a novel approach, Heritability estimation from Admixture Mapping Summary STAtistics (HAMSTA), which uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA $h_\gamma^2$ estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ~5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe $\hat{h}_\gamma^2$ in the 20 phenotypes range from 0.0025 to 0.033 (mean $\hat{h}_\gamma^2 = 0.012 \pm 9.2 \times 10^{-4}$), which translates to $\hat{h}^2$ ranging from 0.062 to 0.85 (mean $\hat{h}^2 = 0.30 \pm 0.023$). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.

13.
Nat Genet ; 55(6): 952-963, 2023 06.
Article in English | MEDLINE | ID: mdl-37231098

ABSTRACT

We explored ancestry-related differences in the genetic architecture of whole-blood gene expression using whole-genome and RNA sequencing data from 2,733 African Americans, Puerto Ricans and Mexican Americans. We found that heritability of gene expression significantly increased with greater proportions of African genetic ancestry and decreased with higher proportions of Indigenous American ancestry, reflecting the relationship between heterozygosity and genetic variance. Among heritable protein-coding genes, the prevalence of ancestry-specific expression quantitative trait loci (anc-eQTLs) was 30% in African ancestry and 8% for Indigenous American ancestry segments. Most anc-eQTLs (89%) were driven by population differences in allele frequency. Transcriptome-wide association analyses of multi-ancestry summary statistics for 28 traits identified 79% more gene-trait associations using transcriptome prediction models trained in our admixed population than models trained using data from the Genotype-Tissue Expression project. Our study highlights the importance of measuring gene expression across large and ancestrally diverse populations for enabling new discoveries and reducing disparities.
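The abstract links the ancestry trend in expression heritability to the relationship between heterozygosity and genetic variance. Under a simple additive model in Hardy-Weinberg equilibrium, a biallelic variant with allele frequency p and per-allele effect a contributes 2p(1-p)a^2 to the trait variance, so loci that are more heterozygous in a population contribute more variance. The Python sketch below illustrates that textbook relationship, assuming independent loci; the function names and values are hypothetical, and this is not the study's heritability estimator.

```python
def expected_heterozygosity(p: float) -> float:
    """Hardy-Weinberg heterozygote frequency 2p(1-p) at a biallelic locus."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2.0 * p * (1.0 - p)


def additive_variance(freqs, effects):
    """Additive genetic variance summed over independent biallelic loci.

    Assumes Hardy-Weinberg and linkage equilibrium; each locus contributes
    2p(1-p)a^2, where a is the per-allele effect on the trait.
    """
    return sum(expected_heterozygosity(p) * a ** 2 for p, a in zip(freqs, effects))


# Hypothetical: the same unit effect at a common (p = 0.5) and a rarer (p = 0.05) variant.
variance = additive_variance([0.5, 0.05], [1.0, 1.0])
```

With equal effects, the common variant contributes 0.5 and the rarer one only 0.095, which is why differences in allele frequency between ancestries can translate into differences in heritability.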


Subject(s)
Black or African American , Hispanic or Latino , Mexican Americans , Humans , Black or African American/genetics , Genome-Wide Association Study , Hispanic or Latino/genetics , Mexican Americans/genetics , Phenotype , Polymorphism, Single Nucleotide , Transcriptome
14.
bioRxiv ; 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37090648

ABSTRACT

Polygenic risk scores (PRS) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across different populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in the summary statistics from genome-wide association studies (GWAS) across multiple ancestry groups. MUSSEL conducts Bayesian hierarchical modeling under a MUltivariate Spike-and-Slab model for effect-size distribution and incorporates an Ensemble Learning step using a super learner to combine information across different tuning parameter settings and ancestry groups. In our simulation studies and data analyses of 16 traits across four distinct studies, totaling 5.7 million participants with substantial ancestral diversity, MUSSEL shows promising performance compared to alternatives. The method, for example, has an average gain in prediction R2 across 11 continuous traits of 40.2% and 49.3% compared to PRS-CSx and CT-SLEB, respectively, in the African ancestry population. The best-performing method, however, varies by GWAS sample size, target ancestry, underlying trait architecture, and the choice of reference samples for LD estimation; thus, a combination of methods may ultimately be needed to generate the most robust PRS across diverse populations.

15.
medRxiv ; 2023 Mar 29.
Article in English | MEDLINE | ID: mdl-37034679

ABSTRACT

Peripheral artery disease (PAD) is a form of atherosclerotic cardiovascular disease, affecting ∼8 million Americans, and is known to have racial and ethnic disparities. PAD has been reported to have significantly higher prevalence in African Americans (AAs) compared to non-Hispanic European Americans (EAs). Hispanic/Latinos (HLs) have been reported to have lower or similar rates of PAD compared to EAs, despite having a paradoxically high burden of PAD risk factors; however, recent work suggests prevalence may differ between sub-groups. Here we examined a large cohort of diverse adults in the BioMe biobank in New York City (NYC). We observed the prevalence of PAD at 1.7% in EAs vs 8.5% and 9.4% in AAs and HLs, respectively, and among HL sub-groups, at 11.4% and 11.5% in Puerto Rican and Dominican populations, respectively. Follow-up analysis that adjusted for common risk factors demonstrated that Dominicans had the highest increased risk for PAD relative to EAs (OR = 3.15 (95% CI 2.33-4.25), $P < 6.44 \times 10^{-14}$). To investigate whether genetic factors may explain this increased risk, we performed admixture mapping by testing the association between local ancestry (LA) and PAD in Dominican BioMe participants (N = 1,940) separately for European (EUR), African (AFR) and Native American (NAT) continental ancestry tracts. We identified a NAT ancestry tract at chromosome 2q35 that was significantly associated with PAD (OR = 2.05 (95% CI 1.51-2.78), $P < 4.06 \times 10^{-6}$), with 22.5% vs 12.5% PAD prevalence in heterozygous NAT tract carriers versus non-carriers, respectively. Fine-mapping at this locus implicated tag SNP rs78529201 located within a long intergenic non-coding RNA (lincRNA) LINC00607, a gene expression regulator of key genes related to thrombosis and extracellular remodeling of endothelial cells, suggesting a putative link of the 2q35 locus to PAD etiology.
In summary, we showed how leveraging health systems data helped understand nuances of PAD risk across HL sub-groups and admixture mapping approaches elucidated a novel risk locus in a Dominican population.

17.
Article in English | MEDLINE | ID: mdl-36767733

ABSTRACT

Over 6.37 million people have died from COVID-19 worldwide, but factors influencing COVID-19-related mortality remain understudied. We aimed to describe and identify risk factors for COVID-19 mortality in the Colorado Center for Personalized Medicine (CCPM) Biobank using integrated data sources, including Electronic Health Records (EHRs). We calculated cause-specific mortality and case-fatality rates for COVID-19 and common pre-existing health conditions defined by diagnostic phecodes and encounters in EHRs. We performed multivariable logistic regression analyses of the association between each pre-existing condition and COVID-19 mortality. Of the 155,859 Biobank participants enrolled as of July 2022, 20,797 had been diagnosed with COVID-19. Of the 5334 Biobank participants who had died, 190 deaths were attributed to COVID-19. The case-fatality rate was 0.91% and the COVID-19 mortality rate was 122 per 100,000 persons. The odds of dying from COVID-19 were significantly increased among older men and among those with 14 of the 61 pre-existing conditions tested, including hypertensive chronic kidney disease (OR: 10.14, 95% CI: 5.48, 19.16) and type 2 diabetes with renal manifestations (OR: 5.59, 95% CI: 3.42, 8.97). Male patients who are older and have pre-existing kidney diseases may be at higher risk for death from COVID-19 and may require special care.
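The two rates reported for the CCPM Biobank use different denominators: the case-fatality rate divides COVID-19 deaths by diagnosed cases, whereas the mortality rate divides them by all enrolled participants. The Python sketch below recomputes both from the counts stated in the abstract (190 deaths, 20,797 diagnosed participants, 155,859 enrolled participants), treating the mortality rate as a simple cumulative proportion without person-time weighting; the function names are illustrative.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Share of diagnosed cases whose death was attributed to the disease."""
    return deaths / cases


def mortality_per_100k(deaths: int, population: int) -> float:
    """Disease deaths per 100,000 people in the whole cohort (no person-time weighting)."""
    return deaths / population * 100_000


covid_deaths = 190       # deaths attributed to COVID-19
covid_cases = 20_797     # participants diagnosed with COVID-19
participants = 155_859   # participants enrolled as of July 2022

print(f"{case_fatality_rate(covid_deaths, covid_cases):.2%}")  # 0.91%
print(round(mortality_per_100k(covid_deaths, participants)))   # 122
```

Both outputs match the figures reported in the abstract, which confirms which denominator each rate uses.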


Subject(s)
COVID-19 , Diabetes Mellitus, Type 2 , Humans , Male , Aged , Diabetes Mellitus, Type 2/epidemiology , SARS-CoV-2 , Colorado/epidemiology , Biological Specimen Banks , Precision Medicine , Risk Factors
18.
Pac Symp Biocomput ; 28: 121-132, 2023.
Article in English | MEDLINE | ID: mdl-36540970

ABSTRACT

Groups of distantly related individuals who share a short segment of their genome identical-by-descent (IBD) can provide insights about rare traits and diseases in massive biobanks using IBD mapping. Clustering algorithms play an important role in finding these groups accurately and at scale. We set out to analyze the fitness of commonly used, fast and scalable clustering algorithms for IBD mapping applications. We designed a realistic benchmark for local IBD graphs and utilized it to compare the statistical power of clustering algorithms by simulating 2.3 million clusters across 850 experiments. We found Infomap and Markov Clustering (MCL) community detection methods to have high statistical power in most of the scenarios. They yield a 30% increase in power compared to the current state-of-the-art approach, with a runtime three orders of magnitude lower. We also found that standard clustering metrics, such as modularity, cannot predict statistical power of algorithms in IBD mapping applications. We extend our findings to real datasets by analyzing the Population Architecture using Genomics and Epidemiology (PAGE) Study dataset with 51,000 samples and 2 million shared segments on Chromosome 1, resulting in the extraction of 39 million local IBD clusters. We demonstrate the power of our approach by recovering signals of rare genetic variation in the Whole-Exome Sequence data of 200,000 individuals in the UK Biobank. We provide an efficient implementation to enable clustering at scale for IBD mapping for various populations and scenarios.
Supplementary Information: The code, along with supplementary methods and figures, is available at https://github.com/roohy/localIBDClustering.
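The abstract notes that modularity, a standard clustering metric, did not predict statistical power for IBD mapping. For reference, the Newman-Girvan modularity of a partition of an undirected graph with m edges is Q = sum over communities c of [L_c/m - (d_c/2m)^2], where L_c counts edges inside c and d_c is the summed degree of its nodes. The Python sketch below computes that quantity for a small hypothetical graph; it is not the authors' benchmark code, which the abstract points to on GitHub.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q for an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, assumed free of duplicates and self-loops
    community: mapping of node -> community label
    """
    m = 0
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        m += 1
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    if m == 0:
        raise ValueError("graph has no edges")
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical graph: two disconnected triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "x" for node in range(6)}
```

Splitting the two triangles gives Q = 0.5, while lumping everything into one community gives Q = 0. Modularity rewards dense, well-separated groups, which need not coincide with the groups that maximize power to detect rare-variant signals.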


Subject(s)
Algorithms , Computational Biology , Humans , Genomics , Cluster Analysis
19.
PLoS One ; 17(10): e0274050, 2022.
Article in English | MEDLINE | ID: mdl-36194597

ABSTRACT

Since the initial reported discovery of SARS-CoV-2 in late 2019, genomic surveillance has been an important tool to understand its transmission and evolution. Here, we sought to describe the underlying regional phylodynamics before and during a rapid spreading event that was documented by surveillance protocols of the United States Air Force Academy (USAFA) in late October-November of 2020. We used replicate long-read sequencing on Colorado SARS-CoV-2 genomes collected July through November 2020 at the University of Colorado Anschutz Medical Campus in Aurora and the United States Air Force Academy in Colorado Springs. Replicate sequencing allowed rigorous validation of variation and placement in a phylogenetic relatedness network. We focus on describing the phylodynamics of a lineage that likely originated in the local Colorado Springs community and expanded rapidly over the course of two months in an outbreak within the well-controlled environment of the United States Air Force Academy. Divergence estimates from sampling dates indicate that the SARS-CoV-2 lineage associated with this rapid expansion event originated in late October 2020. These results are in agreement with transmission pathways inferred by the United States Air Force Academy, and provide a window into the evolutionary process and transmission dynamics of a potentially dangerous but ultimately contained variant.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Colorado/epidemiology , Genome, Viral , Humans , Phylogeny , SARS-CoV-2/genetics
20.
Hum Genomics ; 16(1): 27, 2022 07 27.
Article in English | MEDLINE | ID: mdl-35897116

ABSTRACT

RT-PCR is the foremost clinical test for diagnosis of COVID-19. Unfortunately, PCR-based testing has limitations and may not result in a positive test early in the course of infection before symptoms develop. Enveloped RNA viruses, such as coronaviruses, alter peripheral blood methylation and DNA methylation signatures may characterize asymptomatic versus symptomatic infection. We used Illumina's Infinium MethylationEPIC BeadChip array to profile peripheral blood samples from 164 patients who tested positive for SARS-CoV-2 by RT-PCR, of whom 8 had no symptoms. Epigenome-wide association analysis identified 10 methylation sites associated with infection and a quantile-quantile plot showed little inflation. These preliminary results suggest that differences in methylation patterns may distinguish asymptomatic from symptomatic infection.


Subject(s)
COVID-19 , COVID-19/genetics , Epigenesis, Genetic , Epigenomics , Humans , SARS-CoV-2/genetics